1  Foreword


The objective of the project is to develop a general framework for value-driven decentralized information
processing. Unlike many existing approaches that attempt to identify a single unifying information
metric for network inference, our goal is a framework that is applicable to the various information
value metrics called for by different inference tasks. The motivation is that, while a plethora of
information metrics have been shown to be relevant to various inference problems and are themselves
intimately connected, these metrics are not interchangeable. Each arises naturally in particular
inference problems, and the effort to identify a single metric that applies to all inference problems has so
far been fruitless.

A key feature common to the research problems addressed under this effort is the impact
of practical constraints in a decentralized inference system that may render information loss inevitable. In
such situations, it is not clear how to design an inference network of arbitrary topology, both in the local
processing and in the information and data flow.

This project took a two-pronged approach to value-driven inference over general
networks.

1. For the classical networks, including both tandem and parallel networks, we investigate
a number of inference problems that are challenging and significant in their own right and are
also informative as to how the results may translate to inference problems over general networks.
Specific problems under this thrust include sufficiency-principle-based data reduction for parallel
networks under quantization constraints; quantizer design for decentralized estimation over parallel
networks; the role of interaction and information exchange in tandem and parallel networks; the optimal
information flow over a tandem network for general inference problems; and the applicability of
Wyner’s common information in various decentralized systems when rate constraints lead to inevitable
information loss.

2. For networks of general topology, inference problems are notoriously challenging because of the
nested nature of information flow. Additionally, practical constraints such as finite-bit information
exchange are much more difficult to handle than in simple networks. For this part, we
strive for a deep understanding of the classical network consensus problem, both algorithmically
and in terms of convergence and consensus error performance. We focus on the
ADMM (alternating direction method of multipliers) because of its fast convergence and its amenability
to decentralized implementation. Since many inference problems can be formulated as
consensus problems (e.g., decentralized detection can be formulated as finding consensus
on the log-likelihood ratio), the analysis sheds light on both the potential approaches and their
performance for inference over general networks. Decentralized hypothesis testing over general networks
is studied to illustrate how network consensus under quantization constraints can help attain
optimal error exponents.

While both thrusts have led to important research results, our ultimate goal is to provide a sound
approach to the study of inference over general networks with arbitrary topology. The issues identified and
investigated using the classical, well-structured networks, such as the role of interactive fusion and the
optimal information flow over those simple networks, have helped inform our approach to inference
over arbitrarily connected networks.

Over the course of the project, a total of five doctoral students have worked on research problems related
to this project. One of them graduated in late 2013 and has since joined Nuance Communications. The other four
doctoral students are expected to graduate within a year. Six archival journal papers have been published,
with three more currently under review or in preparation.

The PI is truly indebted to Dr. Liyi Dai for his continued support of this effort. His engaging discussions
of various technical aspects of the research throughout the project period, as well as his effort in
reducing the administrative overhead of meetings and presentations on the part of the PI, have made this
project a truly pleasant experience.

2  Statement  of  Problem  Studied 

Inference  over  networks  has  received  attention  from  various  research  communities  over  the  past  few  decades. 
While  classical  surveillance  applications  involving  physical  sensor  networks  have  been  the  major  impetus 
in  this  research  area,  emerging  applications  involving  both  physical  and  virtual  networks  have  significantly 
broadened  the  scope  and  applications  of  network  inference  research. 

The research project investigates a number of research problems arising in various application domains.
While the problems themselves are diverse in nature, they share a common thread. It is desirable
to attain lossless inference performance, in the sense that decentralized inference achieves the same
performance, as measured by suitable information metrics, as a super-genius with centralized
access to the data in the network and unlimited computing power. Practical constraints, however, often
make information loss inevitable. The challenge is to design an inference network, both in processing and in
information flow, such that the information loss due to the various constraints is kept to a minimum.
Of particular interest is the so-called quantization constraint,
under which information exchange is often inevitably lossy, as dictated by the data processing inequality. We list
below the specific research problems undertaken under the auspices of this award.

• The sufficiency principle is a guiding principle for data reduction in statistical inference. In a
decentralized system where nodes are subject to quantization constraints, there is a need to develop a new
framework for data reduction, especially when data dependence is present.

• For a classical two-node tandem network, one may conjecture that information exchange in the form
of so-called interactive fusion can recoup the information loss due to quantization. We show that
this is possible only under certain settings. Similarly, in a parallel network with asynchronous
transmissions, overhearing other nodes’ information may or may not improve inference performance,
depending on the data model.

• The design of information flow over linear networks is studied with the goal of optimizing the
information value at the terminal node, whose choice is itself a design problem. This problem has been studied
under the name of communication direction for a two-node network, where contradictory results have
appeared in the literature. There is thus a need to reconcile the differences for a clear understanding of
how information flow affects the obtained information value, which is a proxy for inference
performance.

• Decentralized estimation with identical quantizers in a parallel network is attractive in practice for
its simplicity. We study conditions under which it is also theoretically optimal. In addition, for
dependent observations, we study how data correlation affects estimation performance using
the Gaussian observation model.

• Network consensus problems in which nodes are subject to quantization are studied using the alternating
direction method of multipliers (ADMM). Our emphasis is on the analysis of convergence and
consensus performance under various practical constraints, e.g., bounded quantizers.

• Decentralized hypothesis testing in general networks is studied, where the goal is to determine whether
the optimal asymptotic performance can be attained when nodes are subject to bounded quantization
while the data themselves may be unbounded.


3  Summary  of  the  Most  Important  Results 

In  the  following,  we  only  summarize  the  most  significant  results  that  have  the  potential  to  impact  future 
research  direction  in  a  meaningful  way. 

3.1  Interactive  Fusion 

Existing  literature  in  information  fusion  almost  exclusively  assumes  a  static  setting  in  information  flow: 
nodes  propagate  information  on  a  directed  graph  (often  in  the  form  of  a  parallel,  tandem,  or  tree  network) 
and  no  interaction  is  assumed  or  allowed  between  nodes.  We  have  instead  taken  a  more  holistic  approach 
on  information  fusion  where  node  interaction  is  allowed  in  that  communications  may  occur  in  an  interactive 
manner. Illustrated in Fig. 1 is the contrast between a static fusion system and an interactive one with a
two-node tandem network. Note this differs from the traditional study of feedback in tree structure information
fusion  as  we  do  not  limit  the  number  of  rounds  of  interaction  and  do  not  restrict  it  to  only  between  fusion 
center  and  peripheral  nodes. 

Figure 1: Two-node tandem network with (a) static fusion and (b) interactive fusion.

We established in [1] that, with conditionally independent observations, interactive fusion may strictly
improve detection performance in the finite-sample regime but offers no improvement over the static tandem
fusion system in the large-sample regime. The optimum error exponent, namely the Kullback-Leibler
distance, remains the same for both systems. However, with conditionally dependent observations, strict
performance improvement in both the finite-sample and asymptotic regimes is possible.
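
To make the role of the error exponent concrete, the following sketch computes the Kullback-Leibler
distance between two discrete distributions and the first-order decay of the miss probability that it
implies. It assumes the textbook Stein's-lemma setting (i.i.d. samples and a centralized Neyman-Pearson
test); the function names and the two-point distributions are illustrative choices and are not taken from [1].

```python
import math


def kl_divergence(p, q):
    """Return D(p || q) in nats for two discrete distributions on the same alphabet."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i == 0.0:
                return math.inf  # p puts mass where q has none
            total += p_i * math.log(p_i / q_i)
    return total


def approx_miss_probability(p0, p1, n):
    """First-order approximation exp(-n * D(p0 || p1)) of the type-II error.

    Assumption: Stein's-lemma regime with a fixed type-I error constraint.
    Sub-exponential factors are ignored.
    """
    return math.exp(-n * kl_divergence(p0, p1))


# Hypothetical observation models under H0 and H1.
p0 = [0.5, 0.5]
p1 = [0.9, 0.1]
exponent = kl_divergence(p0, p1)
```

A larger exponent means the error probability decays faster with the number of samples, so two fusion
architectures with the same exponent are asymptotically equivalent even if they differ for finite samples.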

The  study  of  interactive  fusion  is  based  on  a  simple  but  elegant  result  regarding  the  optimal  decision 
structure  for  general  inference  problems  with  convex  or  affine  objective  functions.  This  simple  result  has 
broader  applications  to  inference  problems  that  are  beyond  the  specific  problem  of  interactive  fusion.  For 
example, one can establish that for the general tandem fusion system, the communication direction should
always favor the sensor with the higher SNR, i.e., that sensor should serve as the fusion center [2].

This interactive fusion framework can be applied to various fusion systems. In particular, we
have studied sensor overhearing in a simple parallel fusion system, where similar
results have been established that contrast the system performance with overhearing to that of independent
processing at all peripheral nodes [3].

3.2  Data  Reduction  with  Quantization  Constraint 

The sufficiency principle acts as a guiding principle for data reduction in statistical inference. A sufficient
statistic is a function of the data, chosen so that it ‘should summarize the whole of the relevant information
supplied by the sample’. In decentralized settings, a sufficient statistic defined with respect to local data is
referred to as a local sufficient statistic; if a collection of local statistics forms a global sufficient statistic,
they are said to be globally sufficient. While sufficiency-based data reduction ensures no loss of inference
performance when using the reduced data, communicating even a one-dimensional real-valued statistic may still be
infeasible when communication is subject to a finite capacity constraint. A question then arises: if each node in a
decentralized inference system has to summarize its data using a finite number of bits, is it still optimal to
implement data reduction using global sufficient statistics prior to quantization?

Figure 2: Data reduction in a two-node parallel system with quantization constraint.

This is illustrated in Fig. 2 for a two-node parallel system. Observations $X_1$ and $X_2$ (each a
vector observation of potentially high dimension) are subject to a quantization constraint prior to being sent to
the fusion center. With conditionally independent observations, i.e.,

$$p(x_1, x_2 \mid \theta) = p(x_1 \mid \theta)\, p(x_2 \mid \theta),$$

one can establish that sufficiency-driven data reduction (i.e., summarizing $X_i$ using a sufficient statistic
$T(X_i)$) is still optimal even in the presence of quantizers. However, with dependent observations, the answer
is unfortunately no: a simple example given in [4] shows that global sufficiency does not guarantee
optimal data reduction in the presence of finite-bit quantization, which inevitably leads to information loss.
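
The reduce-then-quantize structure discussed above can be sketched as follows. This is a minimal
illustration under assumptions not stated in the report: the local data are i.i.d. Gaussian with known
variance (so the sample mean is a sufficient statistic for the mean), and the quantizer is a uniform
scalar quantizer on a fixed range. All function names and the range [-4, 4] are hypothetical.

```python
import math


def sufficient_statistic(x):
    """Sample mean: sufficient for the mean of i.i.d. Gaussian data with known variance."""
    return sum(x) / len(x)


def quantize(t, bits, lo=-4.0, hi=4.0):
    """Map a real value to one of 2**bits cell indices of a uniform quantizer on [lo, hi]."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    index = math.floor((t - lo) / step)
    return min(max(index, 0), levels - 1)  # saturate values outside [lo, hi]


def dequantize(index, bits, lo=-4.0, hi=4.0):
    """Return the midpoint of the quantization cell with the given index."""
    step = (hi - lo) / 2 ** bits
    return lo + (index + 0.5) * step


# Two different local data vectors with the same sufficient statistic
# produce the same finite-bit message to the fusion center.
x_a = [1.0, 2.0, 3.0]
x_b = [2.0, 2.0, 2.0]
message_a = quantize(sufficient_statistic(x_a), bits=3)
message_b = quantize(sufficient_statistic(x_b), bits=3)
```

The sketch only shows the structure of the local processing; whether this structure is optimal depends on
the dependence model, as the surrounding discussion explains.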

Within the class of conditionally dependent observations, we have identified in [4] cases
where quantizing local sufficient statistics is structurally optimal. Using a simple two-node system as an
illustration, when $X_1$ and $X_2$ are conditionally dependent and $\theta$ is the underlying parameter of inference
interest, a hidden variable $W$ can be introduced such that the following Markov chains hold:

$$X_1 - W - X_2,$$

$$\theta - W - (X_1, X_2).$$

Within this hierarchical conditional independence model, first introduced in [5], if $T_1(X_1)$ and $T_2(X_2)$ are
local statistics that are sufficient with respect to $W$, quantizing $T_1(X_1)$ and $T_2(X_2)$ at the respective sensors
is structurally optimal for the decentralized inference problem. This new framework of decentralized data
reduction with quantization constraints has broad applications to numerous inference problems involving
networks of sensors and warrants further study under more general network settings.
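
As a short worked step, the two Markov chains imply a factorization of the joint likelihood that shows why
statistics sufficient for $W$ carry everything the data say about $\theta$. The derivation below assumes
that $W$ admits a density (replace the integral by a sum for discrete $W$) and uses the Neyman-Fisher
factorization $p(x_i \mid w) = g_i(T_i(x_i), w)\, h_i(x_i)$; the functions $g_i$ and $h_i$ are notation
introduced here for illustration. It establishes global sufficiency of $(T_1, T_2)$ only, not the
structural optimality result of [4].

```latex
\begin{align*}
p(x_1, x_2 \mid \theta)
  &= \int p(x_1, x_2 \mid w, \theta)\, p(w \mid \theta)\, dw \\
  &= \int p(x_1, x_2 \mid w)\, p(w \mid \theta)\, dw
     && \text{since } \theta - W - (X_1, X_2) \\
  &= \int p(x_1 \mid w)\, p(x_2 \mid w)\, p(w \mid \theta)\, dw
     && \text{since } X_1 - W - X_2 \\
  &= h_1(x_1)\, h_2(x_2) \int g_1\bigl(T_1(x_1), w\bigr)\,
     g_2\bigl(T_2(x_2), w\bigr)\, p(w \mid \theta)\, dw .
\end{align*}
```

The part of the likelihood that depends on $\theta$ involves the data only through
$(T_1(x_1), T_2(x_2))$, so the pair of local statistics is globally sufficient for $\theta$.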

3.3  Network  Consensus  and  Quantized  ADMM 

Few algorithms exist for distributed optimization under a quantized communication
constraint. Existing quantized algorithms are based on the subgradient method and are only guaranteed to
reach a neighborhood of the optimal value at a sublinear rate, with an error that increases with the size of the
network. Recently, an ADMM-based quantized algorithm, referred to as the quantized consensus ADMM
(QC-ADMM), was proposed in [7]. A more general result has subsequently been obtained in [6], which
primarily solves distributed optimization problems of the following form

$$\arg\min \sum_{i=1}^{N} f_i(x),$$

where $f_i : \mathbb{R}^M \to \mathbb{R}$ is the local objective function, using only local computation and quantized
communication.

The advantage of the proposed algorithm is that, when certain convexity assumptions are satisfied, all
agent variables converge to the same quantization point within $\log_{1+\eta} \Omega$ iterations, where $\eta > 0$ depends on the local
objectives and the network topology, and $\Omega$ is a polynomial fraction determined by the quantization resolution,
the distance between the initial and optimal variable values, the local objective functions, and the network
topology. Furthermore, the consensus error does not depend on the size of the network and is usually smaller
than the error of existing quantized algorithms.

While the above algorithm is readily applied to distributed averaging, which is equivalent to a least-squares
minimization problem, we note that the QC-ADMM does not converge to a unique point. For locally convergent
algorithms, it is well known that a good starting point usually helps. Based on this fact, [7] proposed a
two-stage method that first uses the ADMM with dithered quantization to obtain a good starting point and then
employs the QC-ADMM to reach a consensus. Simulations show that the consensus error of this two-stage
approach is typically less than one quantization resolution for all connected networks, where agents’ data
can be of arbitrary magnitude.
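
The following sketch illustrates distributed averaging by a consensus ADMM iteration in which only rounded
values are exchanged. It is a generic illustration in the spirit of the scheme described above and should
not be read as the exact algorithm of [6] or [7]: the update rule is the standard decentralized consensus
ADMM for the local objectives $f_i(x) = \frac{1}{2}(x - r_i)^2$, the rounding quantizer is unbounded, and the
dithered first stage is omitted. The function names, the step size `c`, and the example graph are
assumptions made here.

```python
import numpy as np


def round_quantize(x, resolution):
    """Unbounded rounding quantizer: map each entry to the nearest multiple of resolution."""
    return resolution * np.round(x / resolution)


def consensus_admm_average(r, adjacency, c=1.0, iters=200, resolution=None):
    """Consensus ADMM for averaging the entries of r over an undirected connected graph.

    Each node i holds r[i] and minimizes 0.5 * (x - r[i])**2 locally.
    If resolution is given, nodes only exchange rounded values (assumed
    quantization model); if it is None, exact values are exchanged.
    """
    r = np.asarray(r, dtype=float)
    A = np.asarray(adjacency, dtype=float)
    deg = A.sum(axis=1)  # number of neighbors of each node

    def q(v):
        return v if resolution is None else round_quantize(v, resolution)

    x = r.copy()
    alpha = np.zeros_like(r)  # local dual variables; their sum stays zero
    xq = q(x)
    for _ in range(iters):
        # Closed-form primal update for the quadratic local objective.
        x = (r - alpha + c * deg * xq + c * (A @ xq)) / (1.0 + 2.0 * c * deg)
        xq = q(x)
        # Dual update driven by disagreement with neighbors.
        alpha = alpha + c * (deg * xq - A @ xq)
    return xq


# Hypothetical 4-node path graph and local data with average 4.0.
path_graph = np.array([[0, 1, 0, 0],
                       [1, 0, 1, 0],
                       [0, 1, 0, 1],
                       [0, 0, 1, 0]])
data = [1.0, 4.0, 2.5, 8.5]
exact = consensus_admm_average(data, path_graph, iters=2000)
quantized = consensus_admm_average(data, path_graph, iters=200, resolution=0.5)
```

Without quantization the iterates approach the exact average. With rounding, the returned values are
multiples of the resolution and, as noted above, the point reached may depend on the initialization.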

This line of work can be readily extended to cases of more practical significance. For example,
with the rounding quantizer assumed above, one still needs infinitely many quantization levels (hence infinitely
many bits for representation) when the input is unbounded. An extension to bounded quantizers with unbounded
input has been proposed in [8], where similar convergence results have been established. Perhaps more
important, the proposed network consensus approach is useful for solving
network inference problems. An example is decentralized hypothesis testing over a network of general but
connected topology: it is established in [9] that the developed consensus approach allows a decentralized
scheme to achieve the optimal error exponent of its centralized counterpart, a conclusion that is
significantly stronger than existing results in the literature.
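
The link between consensus and hypothesis testing can be illustrated with a small sketch. With
conditionally independent observations, the centralized log-likelihood ratio is the sum of the local ones,
so a network that agrees on their average holds the centralized test statistic up to the known factor $N$.
The Gaussian mean-shift model, the function names, and the use of an exact average in place of a quantized
consensus algorithm are assumptions made here for illustration; this is not the scheme analyzed in [9].

```python
import numpy as np


def gaussian_llr(x, mu0, mu1, sigma):
    """Log-likelihood ratio log p1(x)/p0(x) for N(mu1, sigma^2) versus N(mu0, sigma^2)."""
    return ((mu1 - mu0) * x - 0.5 * (mu1 ** 2 - mu0 ** 2)) / sigma ** 2


def decide(average_llr, threshold=0.0):
    """Return 1 (decide H1) if the agreed average exceeds the threshold, else 0."""
    return 1 if average_llr > threshold else 0


# Hypothetical observations, one per node.
observations = np.array([0.9, 1.2, -0.1, 1.5])
local_llr = gaussian_llr(observations, mu0=0.0, mu1=1.0, sigma=1.0)

# Stand-in for the consensus step: every node ends up with the network average.
consensus_value = local_llr.mean()
decision = decide(consensus_value)
```

In a quantized setting the consensus value would only be known up to the consensus error, which is why the
convergence and error analysis of the consensus algorithm matters for the achievable error exponent.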